as a bidirected graph
as a bidirected graph with haplotype paths
A snarl is a subgraph bounded by two node sides that are:
A run of consecutive snarls and nodes is called a chain.
Snarls and chains can be nested inside of each other.
The nested relationship of snarls and chains is described by the snarl tree.
A net graph is a representation of a snarl in which each child chain is collapsed into a single node.
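The nesting described above can be sketched as a small recursive data structure. This is only an illustration: the class names, the field names, and the toy graph are assumptions made for the example and are not the types used inside vg.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    node_id: int


@dataclass
class Snarl:
    # Boundary node ids (hypothetical field names, not vg's own).
    start: int
    end: int
    # Child chains nested inside this snarl.
    chains: List["Chain"] = field(default_factory=list)


@dataclass
class Chain:
    # A run of consecutive nodes and snarls, in order.
    items: List[Union[Node, Snarl]] = field(default_factory=list)


def snarl_levels(chain, level=0):
    """Yield (snarl, level) pairs; snarls in the top-level chain have level 0."""
    for item in chain.items:
        if isinstance(item, Snarl):
            yield item, level
            for child in item.chains:
                yield from snarl_levels(child, level + 1)


# Toy example: a snarl between nodes 2 and 5 nested inside a snarl between 1 and 6.
inner = Snarl(start=2, end=5, chains=[Chain([Node(3)]), Chain([Node(4)])])
outer = Snarl(start=1, end=6, chains=[Chain([Node(2), inner, Node(5)])])
top = Chain([Node(1), outer, Node(6)])

levels = [(s.start, s.end, lvl) for s, lvl in snarl_levels(top)]
print(levels)
```

In this sketch the level of a snarl is the number of snarls that enclose it, with 0 for snarls in the top-level chain.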
Enumerate alleles for each snarl on a reference path. Before: all paths.
Enumerate alleles for each snarl on a reference path. Now: only haplotypes.
Enumerate alleles for each snarl on a reference path, including nested snarls.
vg deconstruct
##INFO=<ID=LV,Number=1,Type=Integer,Description="Level in the snarl tree (0=top level)">
##INFO=<ID=PS,Number=1,Type=String,Description="ID of variant corresponding to parent snarl">
##INFO=<ID=AT,Number=R,Type=String,Description="Allele Traversal as path in graph">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
ref 6 >1>4 GGCAC CTTAG 60 . AT=>1>2>4,>1>3>4;LV=0 GT 0|1
ref 15 >4>9 CCCAGG CCGGTAACTACCGTCACCAGG,CCGGTACGTCA 60 . AT=>4>8>9,>4>5>6>7>8>9,>4>5>7>9;LV=0 GT 1|2
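As a rough illustration of how the AT field relates to the genotype, the sketch below parses the INFO string of the second record above and looks up the graph walk taken by each haplotype. The helper names are hypothetical, and it assumes that `>` marks a node visited in the forward orientation and `<` one visited in reverse.

```python
import re


def parse_traversal(traversal):
    """Split a walk such as '>4>5>7>9' into (node_id, is_reverse) steps."""
    return [(int(node), sign == "<")
            for sign, node in re.findall(r"([<>])(\d+)", traversal)]


def allele_traversals(info):
    """Return one parsed walk per allele (REF first) from a VCF INFO string."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return [parse_traversal(t) for t in fields["AT"].split(",")]


# INFO and GT values copied from the record at POS 15 above.
info = "AT=>4>8>9,>4>5>6>7>8>9,>4>5>7>9;LV=0"
genotype = "1|2"

walks = allele_traversals(info)
# Node ids visited by each haplotype of sample1, in phase order.
haplotype_nodes = [[node for node, _ in walks[int(allele)]]
                   for allele in genotype.split("|")]
print(haplotype_nodes)
```

Because AT is declared with Number=R, the first walk belongs to the REF allele and the remaining walks follow the order of the ALT alleles, so a genotype index can be used directly as a list index.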
vg deconstruct
##INFO=<ID=LV,Number=1,Type=Integer,Description="Level in the snarl tree (0=top level)">
##INFO=<ID=PS,Number=1,Type=String,Description="ID of variant corresponding to parent snarl">
##INFO=<ID=AT,Number=R,Type=String,Description="Allele Traversal as path in graph">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
ref 10 >1>7 AAA AAAAAA,AAAA 60 . AT=>1>5>6>7,>1>2>3>4>5>6>7,>1>4>5>6>7;NS=1;LV=0 GT 1|2
vg deconstruct
##INFO=<ID=LV,Number=1,Type=Integer,Description="Level in the snarl tree (0=top level)">
##INFO=<ID=PS,Number=1,Type=String,Description="ID of variant corresponding to parent snarl">
##INFO=<ID=AT,Number=R,Type=String,Description="Allele Traversal as path in graph">
##contig=<ID=ref#0#0,length=56>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2
ref 11 >2>5 CTTAG AAGTC 60 . AT=>2>3>5,>2>4>5;LV=0 GT 0|0 1|.
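The record above has a missing allele (`.`) in the second sample's genotype. A minimal sketch of counting the called alleles across samples while skipping missing ones could look like this; `allele_counts` is a hypothetical helper, not part of vg.

```python
import re
from collections import Counter


def allele_counts(genotypes):
    """Count called allele indices across samples, ignoring missing ('.') calls."""
    counts = Counter()
    for gt in genotypes:
        # Accept both phased ('|') and unphased ('/') separators.
        for allele in re.split(r"[|/]", gt):
            if allele != ".":
                counts[int(allele)] += 1
    return counts


# GT columns for sample1 and sample2, copied from the record at POS 11 above.
counts = allele_counts(["0|0", "1|."])
print(counts)
```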
vg deconstruct
Trick for getting the snarl decomposition to look better (currently only for the distance index):
vg index -j [graph.dist] -w 6
vg giraffe
Short reads
Long reads
On the HPRC v2 graph which is x size?
vg graph formats and indexes
Indexes
.gbwt (Graph Burrows-Wheeler Transform): haplotype paths
.gg (GBWT Graph): node sequences for a GBWT
.dist (Distance Index): snarl decomposition plus minimum distances
.zipcodes: per-node distance information used by vg giraffe
.min (Minimizer Index): minimizers used by vg giraffe
.gcsa (Generalized Compressed Suffix Array): substring index used by vg map and vg mpmap
Graphs
.gbz (GBWT + GG): the graph induced by the GBWT
.hg (/.vg) (HashGraph): graph format optimized for speed
.pg (/.vg) (PackedGraph): graph format optimized for space efficiency
.xg: older graph format
.vg: protobuf-based graph format
vg wiki
vg manpage: https://github.com/vgteam/vg/wiki/vg-manpage
snarls paper doi: 10.1089/cmb.2017.0251
short read giraffe paper doi: 10.1126/science.abg8871
long read giraffe paper doi: 10.1101/2025.09.29.678807